194 research outputs found
Using Topic Models to Mine Everyday Object Usage Routines Through Connected IoT Sensors
With the tremendous progress in sensing and IoT infrastructure, it is
foreseeable that IoT systems will soon be available for commercial markets,
such as in people's homes. In this paper, we present a deployment study using
sensors attached to household objects to capture the resourcefulness of three
individuals. The concept of resourcefulness highlights the ability of humans to
repurpose objects spontaneously for a different use case than was initially
intended. It is a crucial element for human health and wellbeing, which is of
great interest for various aspects of HCI and design research. Traditionally,
resourcefulness is captured through ethnographic practice. Ethnography can only
provide sparse and often short duration observations of human experience, often
relying on participants being aware of and remembering behaviours or thoughts
they need to report on. Our hypothesis is that resourcefulness can also be
captured through continuously monitoring objects being used in everyday life.
We developed a system that records object movement continuously and deployed
it in the homes of three elderly people for over two weeks. We explored the
use of probabilistic topic models to analyze the collected data and identify
common patterns.
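As a minimal sketch of the kind of analysis described above, a probabilistic topic model such as LDA can be fit to days of object-usage events, treating each day as a "document" and each object identifier as a "word". The object names, event encoding, and parameters below are illustrative assumptions, not the paper's actual pipeline.

```python
# Sketch: mining daily object-usage routines with a topic model (LDA).
# Each "document" is one day of object-movement events; each "word" is an
# object identifier. Object names and counts are illustrative only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

days = [
    "kettle mug kettle fridge mug",        # a tea-making morning
    "kettle mug fridge kettle mug mug",
    "remote sofa_cushion remote lamp",     # an evening routine
    "remote lamp sofa_cushion remote",
]

vec = CountVectorizer()
counts = vec.fit_transform(days)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # per-day topic mixture, shape (4, 2)
```

Each row of `doc_topics` is a distribution over latent "routines" for that day; recurring object co-occurrences (kettle + mug + fridge) surface as topics.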
Towards automatic estimation of conversation floors within F-formations
The detection of free-standing conversing groups has received significant
attention in recent years. In the absence of a formal definition, most studies
operationalize the notion of a conversation group either through a spatial or a
temporal lens. Spatially, the most commonly used representation is the
F-formation, defined by social scientists as the configuration in which people
arrange themselves to sustain an interaction. However, the use of this
representation is often accompanied with the simplifying assumption that a
single conversation occurs within an F-formation. Temporally, various
categories have been used to organize conversational units; these include,
among others, turn, topic, and floor. Some of these concepts are hard to define
objectively by themselves. The present work constitutes an initial exploration
into unifying these perspectives by primarily posing the question: can we use
the observation of simultaneous speaker turns to infer whether multiple
conversation floors exist within an F-formation? We motivate a metric for the
existence of distinct conversation floors based on simultaneous speaker turns,
and provide an analysis using this metric to characterize conversations across
F-formations of varying cardinality. We contribute two key findings: firstly,
at the average speaking turn duration of about two seconds for humans, there is
evidence for the existence of multiple floors within an F-formation; and
secondly, an increase in the cardinality of an F-formation correlates with a
decrease in the duration of simultaneous speaking turns.
Comment: 8th International Conference on Affective Computing & Intelligent
Interaction, EMERGent Workshop; 7 pages, 4 figures, 2 tables.
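The abstract does not reproduce the paper's exact metric, but as a hedged sketch, a simple overlap statistic over binary speaking-status sequences could look like this (the function and data are illustrative only):

```python
def simultaneous_speech_fraction(turns):
    """Fraction of time samples in which two or more people speak at once.

    `turns` is a list of equal-length binary sequences, one per person in
    the F-formation (1 = speaking). This is an illustrative statistic,
    not the metric defined in the paper.
    """
    n_samples = len(turns[0])
    overlap = sum(1 for t in zip(*turns) if sum(t) >= 2)
    return overlap / n_samples

# Two people, 10 time samples: they overlap in 3 of them.
a = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]
b = [0, 0, 1, 1, 1, 1, 0, 0, 0, 1]
```

A low overlap fraction within an F-formation would be consistent with a single shared floor (turn-taking), while sustained high overlap hints at multiple concurrent floors.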
Who is where? Matching people in video to wearable acceleration during crowded mingling events
We address the challenging problem of associating acceleration data from a
wearable sensor with the corresponding spatio-temporal region of a person in
video during crowded mingling scenarios. This is an important first step for
multi-sensor behavior analysis using these two modalities. Clearly, as the
number of people in a scene increases, there is also a need to robustly and
automatically associate a region of the video with each person's device. We
propose a hierarchical association approach which exploits the spatial
context of the scene, outperforming the state-of-the-art approaches
significantly. Moreover, we present experiments on matching from 3 to more
than 130 acceleration and video streams which, to our knowledge, is
significantly larger than prior works, where only up to 5 device streams are
associated.
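The paper's method is hierarchical and exploits spatial context; as a simpler baseline sketch of the underlying assignment step, one could correlate each wearable's movement energy with each video track's motion energy and solve a linear assignment. All names and the synthetic data below are illustrative assumptions.

```python
# Baseline sketch of stream-to-person association (not the paper's
# hierarchical method): correlate each wearable's acceleration energy with
# each video track's motion energy, then solve the assignment problem.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
true_motion = rng.random((3, 100))                   # 3 people, 100 steps
accel = true_motion + 0.05 * rng.random((3, 100))    # noisy wearable view
video = true_motion + 0.05 * rng.random((3, 100))    # noisy video view

# Cost = negative correlation between each (wearable, video track) pair.
cost = -np.corrcoef(accel, video)[:3, 3:]
rows, cols = linear_sum_assignment(cost)   # optimal one-to-one matching
```

Here the Hungarian algorithm recovers the identity pairing because each wearable correlates most strongly with its own person's video track.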
A Modular Approach for Synchronized Wireless Multimodal Multisensor Data Acquisition in Highly Dynamic Social Settings
Existing data acquisition literature for human behavior research provides
wired solutions, mainly for controlled laboratory setups. In uncontrolled
free-standing conversation settings, where participants are free to walk
around, these solutions are unsuitable. While wireless solutions are employed
in the broadcasting industry, they can be prohibitively expensive. In this
work, we propose a modular and cost-effective wireless approach for
synchronized multisensor data acquisition of social human behavior. Our core
idea involves a cost-accuracy trade-off by using Network Time Protocol (NTP) as
a source reference for all sensors. While commonly used as a reference in
ubiquitous computing, NTP is widely considered to be insufficiently accurate as
a reference for video applications, where Precision Time Protocol (PTP) or
Global Positioning System (GPS) based references are preferred. We argue and
show, however, that the latency introduced by using NTP as a source reference
is adequate for human behavior research, and the subsequent cost and modularity
benefits are a desirable trade-off for applications in this domain. We also
describe one instantiation of the approach deployed in a real-world experiment
to demonstrate the practicality of our setup in the wild.
Comment: 9 pages, 8 figures, Proceedings of the 28th ACM International
Conference on Multimedia (MM '20), October 12--16, 2020, Seattle, WA, USA.
First two authors contributed equally.
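The cost-accuracy trade-off above rests on NTP's standard clock-offset estimate, computed from the four timestamps of a client-server exchange. The functions below sketch that standard computation (per RFC 5905); the example timestamps are illustrative.

```python
def ntp_offset(t1, t2, t3, t4):
    """Standard NTP clock-offset estimate from the four exchange timestamps:
    t1 = client send, t2 = server receive, t3 = server send,
    t4 = client receive. Returns the estimated offset of the server clock
    relative to the client clock.
    """
    return ((t2 - t1) + (t3 - t4)) / 2.0

def ntp_delay(t1, t2, t3, t4):
    """Round-trip network delay, excluding server processing time."""
    return (t4 - t1) - (t3 - t2)

# Example: client clock 100 ms behind the server, symmetric 10 ms one-way
# delay, 2 ms server processing time.
t1, t2, t3, t4 = 0.000, 0.110, 0.112, 0.022
```

Asymmetric network paths bias this estimate, which is one reason NTP is less accurate than PTP; the paper's argument is that the residual error is still small enough for human-behavior annotation.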
Estimating self-assessed personality from body movements and proximity in crowded mingling scenarios
This paper focuses on the automatic classification of self-assessed
personality traits from the HEXACO inventory during crowded mingle
scenarios. We exploit acceleration and proximity data from a wearable device
hung around the neck. Unlike most state-of-the-art studies, addressing
personality estimation during mingle scenarios provides a challenging social
context, as people interact dynamically and freely in a face-to-face
setting. While many former studies use audio to extract speech-related
features, we present a novel method of extracting an individual's speaking
status from a single body-worn triaxial accelerometer, which scales easily
to large populations. Moreover, by fusing both speech- and movement-energy
related cues from acceleration alone, our experimental results show
improvements on the estimation of Humility over features extracted from a
single behavioral modality. We validated our method on 71 participants,
where we obtained an accuracy of 69% for Honesty, Conscientiousness, and
Openness to Experience. To our knowledge, this is the largest validation of
personality estimation carried out in such a social context with simple
wearable sensors.
No-audio speaking status detection in crowded settings via visual pose-based filtering and wearable acceleration
Recognizing who is speaking in a crowded scene is a key challenge towards the
understanding of the social interactions going on within. Detecting speaking
status from body movement alone opens the door for the analysis of social
scenes in which personal audio is not obtainable. Video and wearable sensors
make it possible to recognize speaking in an unobtrusive, privacy-preserving way.
When considering the video modality, in action recognition problems, a bounding
box is traditionally used to localize and segment out the target subject, to
then recognize the action taking place within it. However, cross-contamination,
occlusion, and the articulated nature of the human body, make this approach
challenging in a crowded scene. Here, we leverage articulated body poses for
subject localization and in the subsequent speech detection stage. We show that
the selection of local features around pose keypoints has a positive effect on
generalization performance while also significantly reducing the number of
local features considered, making for a more efficient method. Using two
in-the-wild datasets with different viewpoints of subjects, we investigate the
role of cross-contamination in this effect. We additionally make use of
acceleration measured through wearable sensors for the same task, and present a
multimodal approach combining both methods.
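The core idea of selecting local features around pose keypoints, rather than processing a whole bounding box, can be sketched as cropping small patches at keypoint coordinates. The function below is an illustrative sketch of that idea, not the paper's pipeline.

```python
import numpy as np

def keypoint_patches(frame, keypoints, half=2):
    """Crop a small square patch around each pose keypoint instead of
    processing the whole bounding box (illustrative of the idea, not the
    paper's method). Patches at the image border are clipped.

    frame: image array of shape (H, W, ...); keypoints: (x, y) pixel pairs.
    """
    h, w = frame.shape[:2]
    patches = []
    for x, y in keypoints:
        x0, x1 = max(0, x - half), min(w, x + half + 1)
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        patches.append(frame[y0:y1, x0:x1])
    return patches
```

Restricting features to these patches discards background and neighboring bodies, which is how cross-contamination from other people in a crowded scene is reduced.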